Generative Adversarial Networks (GAN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Source

  • CS231n: CNN for Visual Recognition

1. Discriminative Model vs. Generative Model

  • Discriminative model




  • Generative model



2. Density Function Estimation

  • Probability
  • What if $x$ is an actual image from the training data? In that case, $x$ can be represented as, for example, a $64\times 64 \times 3$-dimensional vector.
    • The following images are some realizations (samples) from this $64\times 64 \times 3$-dimensional space
  • Probability density function estimation problem
  • If $P_{\text{model}}(x)$ can be estimated so that it is close to $P_{\text{data}}(x)$, then new data can be generated by sampling from $P_{\text{model}}(x)$.

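    • Note: the Kullback–Leibler (KL) divergence measures how much one probability distribution differs from another. It is often used like a distance between two distributions, although it is not a true distance metric (it is not symmetric).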
  • Learn a deterministic transformation via a neural network
    • Start by sampling the code vector $z$ from a simple, fixed distribution such as a uniform distribution or a standard Gaussian $\mathcal{N}(0,I)$
    • Then this code vector is passed as input to a deterministic generator network $G$, which produces an output sample $x=G(z)$
    • This is the role a neural network plays in a generative model: a nonlinear mapping from a simple distribution to the target probability density function
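To make the note on KL divergence concrete, here is a small NumPy sketch for two discrete distributions. The probability vectors `p_data` and `p_model` are made-up values for illustration only. The sketch shows that $D_{KL}(p \,\|\, q)$ is zero when the two distributions match and that swapping the arguments changes the value, which is why KL is called a divergence rather than a distance.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.
    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Made-up distributions over three outcomes (illustration only)
p_data = np.array([0.5, 0.4, 0.1])
p_model = np.array([0.3, 0.3, 0.4])

print(kl_divergence(p_data, p_data))    # 0.0: identical distributions
print(kl_divergence(p_data, p_model))   # positive
print(kl_divergence(p_model, p_data))   # a different positive value: not symmetric
```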



  • An example of a generator network which encodes a univariate distribution with two different modes
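A deterministic map is enough to turn a unimodal input into a multimodal output. As a hand-written stand-in for a trained network (the function `G` below is a hypothetical example, not the network from the figure), this NumPy sketch pushes standard Gaussian samples $z$ through a fixed nonlinear function whose output has two modes, near $-2$ and $+2$.

```python
import numpy as np

def G(z):
    # Hypothetical deterministic "generator": the sign of z selects a mode
    # (at -2 or +2) and the magnitude of z adds a small spread around it.
    return 2.0 * np.sign(z) + 0.3 * z

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # z ~ N(0, 1): a single mode at 0
x = G(z)                          # x = G(z): two modes, near -2 and +2

# Fraction of samples close to each mode (roughly half each)
print(np.mean(np.abs(x - 2.0) < 1.0), np.mean(np.abs(x + 2.0) < 1.0))
# Essentially no mass remains near 0, where z itself was most likely
print(np.mean(np.abs(x) < 1.0))
```

A trained generator learns such a mapping from data instead of having it written by hand, but the principle is the same: all randomness comes from $z$, and $G$ only reshapes it.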



  • Generative model of high dimensional space
  • Generative model of images
    • learn a function that maps independent, normally distributed $z$ values to whatever latent variables the model might need, and then maps those latent variables to $x$ (images)
    • use the first few layers to map the normally distributed $z$ to the latent values
    • then, use later layers to map those latent values to an image
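The two-stage picture above can be sketched as a forward pass through an untrained multilayer perceptron in NumPy. The sizes (a 100-dimensional $z$, a 32-dimensional intermediate representation, a $28\times 28$ output) and the random weights are assumptions chosen only to show how the shapes flow; a real generator learns these weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, latent_dim, img_side = 100, 32, 28   # hypothetical sizes

# Randomly initialized weights, standing in for trained parameters
W1 = rng.normal(0, 0.1, (z_dim, latent_dim))
W2 = rng.normal(0, 0.1, (latent_dim, img_side * img_side))

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator(z):
    h = relu(z @ W1)      # early layer: z -> latent values
    x = sigmoid(h @ W2)   # later layer: latent values -> pixel intensities in (0, 1)
    return x.reshape(-1, img_side, img_side)

z = rng.standard_normal((5, z_dim))   # 5 samples of normally distributed z
images = generator(z)
print(images.shape)                   # (5, 28, 28)
```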



3. Generative Adversarial Networks (GAN)

  • In generative modeling, we'd like to train a network that models a distribution, such as a distribution over images.

  • GANs do not work with any explicit density function!

  • Instead, they take a game-theoretic approach.

3.1. Adversarial Nets Framework

  • One way to judge the quality of the model is to sample from it.

  • Train the model to produce samples that are indistinguishable from the real data, as judged by a discriminator network whose job is to tell real from fake





  • The idea behind Generative Adversarial Networks (GANs): train two different networks


  • Discriminator network: tries to distinguish between real and fake data


  • Generator network: tries to produce realistic-looking samples that fool the discriminator network


3.2. Objective Function of GAN

  • Think of a logistic regression classifier, i.e., the cross-entropy loss between a prediction $h(x)$ and a label $y \in \{0, 1\}$:


$$\text{loss} = -y \log h(x) - (1-y) \log (1-h(x))$$

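  • To train the discriminator, use real data $x$ with label $y=1$ and generated data $G(z)$ with label $y=0$; minimizing the cross entropy above is then equivalent to

$$\max_{D} E_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + E_{z \sim p_{z}(z)}\left[\log (1-D(G(z)))\right]$$

  • To train the generator, make $D$ misclassify generated samples:

$$\min_{G} E_{z \sim p_{z}(z)}\left[\log (1-D(G(z)))\right]$$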

  • Non-saturating game when the generator is trained

  • Early in learning, when $G$ is poor, $D$ can reject samples with high confidence because they are clearly different from the training data. Then $D(G(z)) \approx 0$, and in this region $\log(1-D(G(z)))$ saturates: it is nearly flat, so it provides only weak gradients for updating $G$.



  • Rather than training $G$ to minimize $\log(1-D(G(z)))$ we can train $G$ to maximize $\log D(G(z))$. This objective function provides much stronger gradients early in learning.
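The saturation argument can be checked with one derivative. Assume, as in the MNIST implementation later in this notebook, that $D$ ends with a sigmoid, so $D = \sigma(a)$ where $a$ is the discriminator's logit. Then $\frac{\partial}{\partial a}\log(1-\sigma(a)) = -\sigma(a)$ and $\frac{\partial}{\partial a}\log \sigma(a) = 1-\sigma(a)$. When $D(G(z)) \approx 0$, the first gradient vanishes while the second stays close to 1. A small NumPy sketch with a few made-up logits:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Made-up discriminator logits on generated samples; very negative logits
# mean D confidently rejects them (typical early in training).
logits = np.array([-8.0, -4.0, 0.0, 4.0])
d = sigmoid(logits)                      # D(G(z))

# Gradient magnitude of each generator objective w.r.t. the logit a
grad_saturating = np.abs(-d)             # |d/da log(1 - D)| = D
grad_non_saturating = np.abs(1.0 - d)    # |d/da log D|     = 1 - D

for a, p, g_s, g_ns in zip(logits, d, grad_saturating, grad_non_saturating):
    print(f"logit={a:5.1f}  D={p:.4f}  saturating={g_s:.4f}  non-saturating={g_ns:.4f}")
```

For the most confidently rejected sample, the saturating objective gives a gradient of about $3\times 10^{-4}$, while the non-saturating one gives nearly 1.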

3.3. Solving a Min-Max Problem


Step 1: Fix $G$ and perform a gradient step to


$$\max_{D} E_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + E_{z \sim p_{z}(z)}\left[\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to


$$\max_{G} E_{z \sim p_{z}(z)}\left[\log D(G(z))\right]$$

OR, equivalently, written as minimization problems:



Step 1: Fix $G$ and perform a gradient step to


$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to


$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$
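The two alternating steps above (in the minimization form, with the non-saturating generator loss) can also be written with an explicit `tf.GradientTape`, instead of the `train_on_batch` plus frozen-discriminator pattern used in Section 4. This is a minimal sketch on a toy 2-D problem; the tiny networks, the learning rates, the step count, and the shifted Gaussian standing in for real data are all assumptions for illustration.

```python
import tensorflow as tf

tf.random.set_seed(0)
latent_dim, data_dim = 8, 2   # hypothetical toy sizes

G = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(data_dim)
])
D = tf.keras.Sequential([
    tf.keras.Input(shape=(data_dim,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

bce = tf.keras.losses.BinaryCrossentropy()
d_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
g_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

def train_step(x_real):
    n = x_real.shape[0]

    # Step 1: fix G, one gradient step on D
    # minimizes E[-log D(x)] + E[-log(1 - D(G(z)))]
    z = tf.random.normal((n, latent_dim))
    x_fake = G(z)   # computed outside the tape, so G receives no gradient here
    with tf.GradientTape() as tape:
        d_real, d_fake = D(x_real), D(x_fake)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    grads = tape.gradient(d_loss, D.trainable_variables)
    d_opt.apply_gradients(zip(grads, D.trainable_variables))

    # Step 2: fix D, one gradient step on G
    # minimizes E[-log D(G(z))] (non-saturating loss); only G's weights are updated
    z = tf.random.normal((n, latent_dim))
    with tf.GradientTape() as tape:
        d_fake = D(G(z))
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    grads = tape.gradient(g_loss, G.trainable_variables)
    g_opt.apply_gradients(zip(grads, G.trainable_variables))

    return float(d_loss), float(g_loss)

# Toy "real" data: a 2-D Gaussian centered at (3, 3)
for step in range(100):
    x_real = tf.random.normal((64, data_dim)) + 3.0
    d_loss, g_loss = train_step(x_real)
print(d_loss, g_loss)
```

"Fixing" a network here simply means not passing its variables to the optimizer in that step; the Keras version in Section 4 achieves the same effect by setting `discriminator.trainable = False` inside the combined model.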

4. GAN with MNIST

4.1. GAN Implementation

In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
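(train_x, train_y), _ = tf.keras.datasets.mnist.load_data()

train_x = train_x[np.where(train_y == 2)]   # keep only images of the digit 2
train_x = train_x/255.0                     # scale pixels to [0, 1], the range of G's sigmoid output
train_x = train_x.reshape(-1, 784)          # flatten each 28 x 28 image to a 784-dim vector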

print('train_images :', train_x.shape)
train_images : (5958, 784)
In [4]:
generator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 100),
    tf.keras.layers.Dense(units = 784, activation = 'sigmoid')
])
In [5]:
discriminator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 784),
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid'),
])
In [6]:
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001),
                      loss = 'binary_crossentropy')
In [7]:
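# Freeze D inside `combined`, so that training `combined` updates only G.
# D was compiled above while still trainable, so its own train_on_batch calls keep updating it.
discriminator.trainable = False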

combined_input = tf.keras.layers.Input(shape = (100,))
generated = generator(combined_input)
combined_output = discriminator(generated)

combined = tf.keras.models.Model(inputs = combined_input, outputs = combined_output)
In [8]:
combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                 loss = 'binary_crossentropy')
In [9]:
def make_noise(samples):
    return np.random.normal(0, 1, [samples, 100])
In [10]:
def plot_generated_images(generator, samples = 3):

    noise = make_noise(samples)

    generated_images = generator.predict(noise)
    generated_images = generated_images.reshape(samples, 28, 28)

    for i in range(samples):
        plt.subplot(1, samples, i+1)
        plt.imshow(generated_images[i], 'gray', interpolation = 'nearest')
        plt.axis('off')
        plt.tight_layout()

    plt.show()

Step 1: Fix $G$ and perform a gradient step to

$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to

$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$
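In the Keras code below, both steps are expressed through `binary_crossentropy` purely by the choice of target labels: `real` (ones) for real images and `fake` (zeros) for generated images in Step 1, and `real` again for generated images in Step 2, which turns the loss into $-\log D(G(z))$. A quick numerical check, using made-up discriminator outputs in place of a trained $D$:

```python
import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()   # averages over the batch

# Made-up discriminator outputs, standing in for a trained D (illustration only)
d_real = np.array([[0.9], [0.8]], dtype=np.float32)   # D(x) on real images
d_fake = np.array([[0.3], [0.1]], dtype=np.float32)   # D(G(z)) on generated images

# Step 1: target 1 for real images, target 0 for generated ones
loss_real = float(bce(np.ones_like(d_real), d_real))    # mean of -log D(x)
loss_fake = float(bce(np.zeros_like(d_fake), d_fake))   # mean of -log(1 - D(G(z)))

# Step 2: the combined model labels generated images as real (target 1)
loss_gen = float(bce(np.ones_like(d_fake), d_fake))     # mean of -log D(G(z))

print(loss_real, np.mean(-np.log(d_real)))
print(loss_fake, np.mean(-np.log(1 - d_fake)))
print(loss_gen, np.mean(-np.log(d_fake)))
```

Each pair of printed values agrees up to the small clipping constant Keras applies to avoid $\log 0$.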
In [11]:
n_iter = 20000
batch_size = 100

fake = np.zeros(batch_size)
real = np.ones(batch_size)

for i in range(n_iter):

    # Step 1: train D on a batch of real images (target 1) and a batch of generated images (target 0)
    noise = make_noise(batch_size)
    generated_images = generator.predict(noise, verbose = 0)

    idx = np.random.randint(0, train_x.shape[0], batch_size)
    real_images = train_x[idx]

    D_loss_real = discriminator.train_on_batch(real_images, real)
    D_loss_fake = discriminator.train_on_batch(generated_images, fake)
    D_loss = D_loss_real + D_loss_fake

    # Step 2: train G through the frozen D, labeling generated images as real (loss -log D(G(z)))
    noise = make_noise(batch_size)
    G_loss = combined.train_on_batch(noise, real)

    if i % 5000 == 0:

        print('Discriminator Loss: ', D_loss)
        print('Generator Loss: ', G_loss)

        plot_generated_images(generator)
Discriminator Loss:  1.873877376317978
Generator Loss:  0.338532954454422
Discriminator Loss:  0.18016396462917328
Generator Loss:  2.5140249729156494
Discriminator Loss:  0.44247880578041077
Generator Loss:  2.2012200355529785
Discriminator Loss:  0.5620711445808411
Generator Loss:  2.061182975769043

4.2. After Training

  • After training, use the generator network to generate new data


In [12]:
plot_generated_images(generator)

5. Conditional GAN

  • In an unconditional generative model, there is no control over which modes of the data are generated.
  • In the Conditional GAN (CGAN), the generator learns to generate a fake sample with a specific condition or characteristic (such as a label associated with an image, or a more detailed tag) rather than a generic sample from an unknown noise distribution.




  • A simple modification of the original GAN framework that conditions both the generator and the discriminator on additional information, for better multi-modal learning
  • Enables many practical applications of GANs when explicit supervision (such as labels) is available
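For reference, the conditional objective can be written as the two-player game of Section 3 with both networks conditioned on the extra information $y$. In the code below, $y$ is the one-hot digit label, concatenated to each network's input:

$$\min_{G}\max_{D} \; E_{x \sim p_{\text{data}}(x)}\left[\log D(x \mid y)\right] + E_{z \sim p_{z}(z)}\left[\log \left(1-D(G(z \mid y) \mid y)\right)\right]$$

As in the unconditional case, the generator step in the code uses the non-saturating form $\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z \mid y) \mid y)\right]$, obtained by training `combined` with target labels of 1.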
In [14]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [15]:
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()

train_x, test_x = train_x/255.0 , test_x/255.0
train_x, test_x = train_x.reshape(-1,784), test_x.reshape(-1,784)

train_y = tf.keras.utils.to_categorical(train_y, num_classes = 10)
test_y = tf.keras.utils.to_categorical(test_y, num_classes = 10)

print('train_x: ', train_x.shape)
print('test_x: ', test_x.shape)
print('train_y: ', train_y.shape)
print('test_y: ', test_y.shape)
train_x:  (60000, 784)
test_x:  (10000, 784)
train_y:  (60000, 10)
test_y:  (10000, 10)
In [16]:
generator_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 138),   # 138 = 128 (noise) + 10 (one-hot label)
    tf.keras.layers.Dense(units = 784, activation = 'sigmoid')
])

noise = tf.keras.layers.Input(shape = (128,))
label = tf.keras.layers.Input(shape = (10,))

model_input = tf.keras.layers.concatenate([noise, label], axis = 1)
generated_image = generator_model(model_input)

generator = tf.keras.models.Model(inputs = [noise, label], outputs = generated_image)
In [17]:
generator.summary()
Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_7 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 input_8 (InputLayer)           [(None, 10)]         0           []                               
                                                                                                  
 concatenate_2 (Concatenate)    (None, 138)          0           ['input_7[0][0]',                
                                                                  'input_8[0][0]']                
                                                                                                  
 sequential_2 (Sequential)      (None, 784)          237072      ['concatenate_2[0][0]']          
                                                                                                  
==================================================================================================
Total params: 237,072
Trainable params: 237,072
Non-trainable params: 0
__________________________________________________________________________________________________
In [18]:
discriminator_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 794),   # 794 = 784 (pixels) + 10 (one-hot label)
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid')
])

input_image = tf.keras.layers.Input(shape = (784,))
label = tf.keras.layers.Input(shape = (10,))

model_input = tf.keras.layers.concatenate([input_image, label], axis = 1)
validity = discriminator_model(model_input)

discriminator = tf.keras.models.Model(inputs = [input_image, label], outputs = validity)
In [19]:
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                      loss = ['binary_crossentropy'])
In [20]:
discriminator.summary()
Model: "model_4"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_9 (InputLayer)           [(None, 784)]        0           []                               
                                                                                                  
 input_10 (InputLayer)          [(None, 10)]         0           []                               
                                                                                                  
 concatenate_3 (Concatenate)    (None, 794)          0           ['input_9[0][0]',                
                                                                  'input_10[0][0]']               
                                                                                                  
 sequential_3 (Sequential)      (None, 1)            203777      ['concatenate_3[0][0]']          
                                                                                                  
==================================================================================================
Total params: 203,777
Trainable params: 203,777
Non-trainable params: 0
__________________________________________________________________________________________________
In [21]:
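# Freeze D inside `combined`, so that training `combined` updates only G
# (D was compiled above while still trainable)
discriminator.trainable = False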

noise = tf.keras.layers.Input(shape = (128,))
label = tf.keras.layers.Input(shape = (10,))

generated_image = generator([noise, label])
validity = discriminator([generated_image, label])

combined = tf.keras.models.Model(inputs = [noise, label], outputs = validity)
In [22]:
combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                 loss = ['binary_crossentropy'])
In [23]:
combined.summary()
Model: "model_5"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_11 (InputLayer)          [(None, 128)]        0           []                               
                                                                                                  
 input_12 (InputLayer)          [(None, 10)]         0           []                               
                                                                                                  
 model_3 (Functional)           (None, 784)          237072      ['input_11[0][0]',               
                                                                  'input_12[0][0]']               
                                                                                                  
 model_4 (Functional)           (None, 1)            203777      ['model_3[0][0]',                
                                                                  'input_12[0][0]']               
                                                                                                  
==================================================================================================
Total params: 440,849
Trainable params: 237,072
Non-trainable params: 203,777
__________________________________________________________________________________________________
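The parameter counts in the three summaries follow from the fact that a `Dense` layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ units has $n_{\text{in}} n_{\text{out}} + n_{\text{out}}$ parameters (weights plus one bias per unit). A quick arithmetic check in plain Python:

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per unit
    return n_in * n_out + n_out

# Generator MLP: 138 (= 128 noise + 10 label) -> 256 -> 784
gen = dense_params(138, 256) + dense_params(256, 784)
# Discriminator MLP: 794 (= 784 pixels + 10 label) -> 256 -> 1
disc = dense_params(794, 256) + dense_params(256, 1)

print(gen)          # 237072
print(disc)         # 203777
print(gen + disc)   # 440849
```

In `combined`, only the generator's 237,072 parameters are trainable; the discriminator's 203,777 are frozen there, matching the summary above.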
In [24]:
def create_noise(samples):
    return np.random.normal(0, 1, [samples, 128])
In [25]:
def plot_generated_images(generator):

    noise = create_noise(10)
    label_onehot = np.eye(10)   # one-hot labels for the digits 0, 1, ..., 9 (one per row)

    generated_images = generator.predict([noise, label_onehot])

    plt.figure(figsize = (12, 3))
    for i in range(generated_images.shape[0]):
        plt.subplot(1, 10, i + 1)
        plt.imshow(generated_images[i].reshape((28, 28)), 'gray', interpolation = 'nearest')
        plt.title('Digit: {}'.format(i))
        plt.axis('off')

    plt.show()
In [ ]:
n_iter = 30000
batch_size = 50

valid = np.ones(batch_size)
fake = np.zeros(batch_size)

for i in range(n_iter):

    # Step 1: train D on real (image, label) pairs with target 1 and generated pairs with target 0
    idx = np.random.randint(0, train_x.shape[0], batch_size)
    real_images, labels = train_x[idx], train_y[idx]

    noise = create_noise(batch_size)
    generated_images = generator.predict([noise,labels], verbose = 0)

    d_loss_real = discriminator.train_on_batch([real_images, labels], valid)
    d_loss_fake = discriminator.train_on_batch([generated_images, labels], fake)
    d_loss = d_loss_real + d_loss_fake

    # Step 2: train G through the frozen D on randomly drawn labels, with target 1
    noise = create_noise(batch_size)
    labels = np.random.randint(0, 10, batch_size)
    labels_onehot = np.eye(10)[labels]

    g_loss = combined.train_on_batch([noise, labels_onehot], valid)

    if i % 5000 == 0:

        print('Discriminator Loss: ', d_loss)
        print('Generator Loss: ', g_loss)

        plot_generated_images(generator)
Discriminator Loss:  1.5396462678909302
Generator Loss:  0.8892247676849365
Discriminator Loss:  0.09676432050764561
Generator Loss:  4.2714643478393555
Discriminator Loss:  0.08658699691295624
Generator Loss:  5.2906494140625
Discriminator Loss:  0.21492188423871994
Generator Loss:  4.767454147338867
Discriminator Loss:  1.4001021981239319
Generator Loss:  1.8565527200698853
Discriminator Loss:  0.19797339290380478
Generator Loss:  4.865542888641357

6. Other Tutorials

In [2]:
%%html
<center><iframe src="https://www.youtube.com/embed/9JpdAg6uMXs?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
  • CS231n: CNN for Visual Recognition
In [3]:
%%html
<center><iframe src="https://www.youtube.com/embed/5WoItGTWV54?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

MIT by Aaron Courville

In [4]:
%%html
<center><iframe src="https://www.youtube.com/embed/JVb54xhEw6Y?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [5]:
%%html
<center><iframe src="https://www.youtube.com/embed/odpjk7_tGY0?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>